Least-squares methods for policy iteration
Authors
Lucian Buşoniu, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos (Team SequeL, INRIA Lille-Nord Europe, France, {ion-lucian.busoniu, alessandro.lazaric, mohammad.ghavamzadeh, remi.munos}@inria.fr); Robert Babuška, Bart De Schutter (Delft Center for Systems and Control, Delft University of Technology, The Netherlands, {r.babuska, b.deschutter}@tudelft.nl). This work was performed in part while Lucian Buşoniu was with the Delft Center for Systems and Control.
Abstract
Approximate reinforcement learning deals with the essential problem of applying reinforcement learning in large and continuous state-action spaces, by using function approximators to represent the solution. This chapter reviews least-squares methods for policy iteration, an important class of algorithms for approximate reinforcement learning. We discuss three techniques for solving the core policy evaluation component of policy iteration: least-squares temporal difference (LSTD), least-squares policy evaluation (LSPE), and Bellman residual minimization (BRM). We introduce these techniques starting from their general mathematical principles and detail them down to fully specified algorithms. We pay attention to online variants of policy iteration, and provide a numerical example highlighting the behavior of representative offline and online methods. For the policy evaluation component, as well as for the overall resulting approximate policy iteration, we provide guarantees on the performance obtained asymptotically, as the number of samples processed and iterations executed grows to infinity. We also provide finite-sample results, which apply when a finite number of samples and iterations are considered. Finally, we outline several extensions and improvements to the techniques and methods reviewed.
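To make the policy evaluation step concrete, the following is a minimal sketch of LSTD for Q-functions (LSTD-Q), the evaluation routine at the heart of least-squares policy iteration (LSPI). It is an illustration under assumed names (samples, phi, policy, gamma), not code from the chapter:

    import numpy as np

    def lstdq(samples, phi, policy, gamma, n_features, reg=1e-6):
        """Estimate weights w such that Q(s, a) ~= phi(s, a) @ w under `policy`.

        samples: iterable of (s, a, r, s_next) transitions
        phi:     feature map, phi(s, a) -> array of shape (n_features,)
        policy:  the policy being evaluated, policy(s) -> action
        """
        A = reg * np.eye(n_features)   # small ridge term keeps A invertible
        b = np.zeros(n_features)
        for s, a, r, s_next in samples:
            f = phi(s, a)
            f_next = phi(s_next, policy(s_next))
            A += np.outer(f, f - gamma * f_next)   # LSTD-Q system matrix
            b += r * f
        return np.linalg.solve(A, b)

Policy iteration then alternates this evaluation step with greedy improvement, taking the next policy to be greedy with respect to phi(s, a) @ w.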
Similar resources
Least-Squares Policy Iteration: Bias-Variance Trade-off in Control Problems
In the context of large space MDPs with linear value function approximation, we introduce a new approximate version of λ-Policy Iteration (Bertsekas & Ioffe, 1996), a method that generalizes Value Iteration and Policy Iteration with a parameter λ ∈ (0, 1). Our approach, called Least-Squares λ Policy Iteration, generalizes LSPI (Lagoudakis & Parr, 2003) which makes efficient use of training samp...
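For reference, the multistep evaluation operator behind λ-policy iteration can be sketched as follows (notation assumed from Bertsekas & Ioffe, 1996, with T_mu the Bellman operator of the current policy mu):

    \[
      T_\mu^{(\lambda)} J \;=\; (1-\lambda) \sum_{m=0}^{\infty} \lambda^m \, (T_\mu)^{m+1} J,
      \qquad \lambda \in (0, 1),
    \]

so that λ → 0 recovers a single Bellman backup (as in value iteration), while λ → 1 recovers the fixed point J_μ of T_μ (full policy evaluation, as in policy iteration); interpolating between the two is the source of the bias-variance trade-off in the title.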
Policy Iteration for Learning an Exercise Policy for American Options
Options are important financial instruments, whose prices are usually determined by computational methods. Computational finance is a compelling application area for reinforcement learning research, where hard sequential decision making problems abound and have great practical significance. In this paper, we investigate reinforcement learning methods, in particular, least squares policy iterati...
Regularized Policy Iteration
In this paper we consider approximate policy-iteration-based reinforcement learning algorithms. In order to implement a flexible function approximation scheme we propose the use of non-parametric methods with regularization, providing a convenient way to control the complexity of the function approximator. We propose two novel regularized policy iteration algorithms by adding L2-regularization to...
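One common concrete instance of this idea is ridge (L2-) regularization of the least-squares system: a penalty beta * ||w||^2 on the weights shifts the system matrix by beta * I. A minimal sketch under that assumption (the paper's non-parametric construction is more general):

    import numpy as np

    def regularized_solve(A, b, beta):
        """Solve the L2-regularized system (A + beta*I) w = b, where
        (A, b) come from a least-squares policy evaluation method
        such as the lstdq sketch above."""
        return np.linalg.solve(A + beta * np.eye(A.shape[0]), b)

Larger beta shrinks the weights, reducing variance at the cost of extra bias, which is exactly the kind of complexity control referred to above.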
Least-Squares Methods in Reinforcement Learning for Control
Least-squares methods have been successfully used for prediction problems in the context of reinforcement learning, but little has been done in extending these methods to control problems. This paper presents an overview of our research efforts in using least-squares techniques for control. In our early attempts, we considered a direct extension of the Least-Squares Temporal Difference (LSTD) a...
Online exploration in least-squares policy iteration
One of the key problems in reinforcement learning is balancing exploration and exploitation. Another is learning and acting in large or even continuous Markov decision processes (MDPs), where compact function approximation has to be used. In this paper, we provide a practical solution to exploring large MDPs by integrating a powerful exploration technique, Rmax, into a state-of-the-art learning...
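A much-simplified sketch of the Rmax idea on top of a learned Q-function: state-action pairs that have not been visited often enough are treated as maximally valuable, which drives the agent to try them. The names visit_count, m_threshold, and v_max are illustrative assumptions; the paper's actual integration with LSPI is more involved:

    def optimistic_action(s, actions, q_value, visit_count, m_threshold, v_max):
        """Greedy action selection with Rmax-style optimism: under-visited
        pairs get the optimistic value v_max (e.g. r_max / (1 - gamma))
        until they have been tried m_threshold times and become 'known'."""
        def optimistic_q(a):
            if visit_count(s, a) < m_threshold:
                return v_max
            return q_value(s, a)
        return max(actions, key=optimistic_q)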
Publication date: 2011